Installing CDH5 step by step
The CDH5 installation basically follows the official manual. Our system environment is Win7 + VirtualBox-4.3.12 + CentOS 6.5. Let's get started.
First, if you already have an older version of Hadoop installed, remove it; if not, you can skip this part:
1. Stop the Hadoop services:
$ for x in `cd /etc/init.d ; ls hadoop-hdfs-*` ; do sudo service $x stop ; done
$ for x in `cd /etc/init.d ; ls hadoop-0.20-mapreduce-*` ; do sudo service $x stop ; done
2. Remove hadoop-0.20-conf-pseudo:
$ sudo yum remove hadoop-0.20-conf-pseudo hadoop-0.20-mapreduce-*
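Before moving on, a quick sanity check (our own habit, not part of the official steps) is to see which Hadoop packages are still installed and whether any daemons are still running:
$ rpm -qa | grep -i hadoop        # lists any Hadoop packages left on the system
$ ps -ef | grep -i '[h]adoop'     # shows any Hadoop daemons that are still running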
Install Java:
- Download the JDK '.tar.gz' file for CDH5 from Oracle; the currently supported version is Java 1.7.0_55.
- Extract the JDK to /usr/java/jdk-version; for example /usr/java/jdk.1.7.0_nn, where nn is a supported version.
- In /etc/default/bigtop-utils, set JAVA_HOME to the directory where the JDK is installed; for example:
export JAVA_HOME=/usr/java/default
- Symbolically link the directory where the JDK is installed to /usr/java/default; for example:
ln -s /usr/java/jdk.1.7.0_nn /usr/java/default
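Putting those Java steps together, a minimal sketch could look like the following (the tarball and directory names, jdk-7u55-linux-x64.tar.gz and jdk1.7.0_55, are assumptions based on the 1.7.0_55 version mentioned above; adjust them to whatever you actually downloaded):
$ sudo mkdir -p /usr/java
$ sudo tar -zxf jdk-7u55-linux-x64.tar.gz -C /usr/java      # assumed tarball name
$ sudo ln -s /usr/java/jdk1.7.0_55 /usr/java/default        # assumed extracted directory name
$ echo 'export JAVA_HOME=/usr/java/default' | sudo tee -a /etc/default/bigtop-utils
$ /usr/java/default/bin/java -version                       # verify the JDK is usable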
Download the CDH5 package
Download the CDH5 release for CentOS, then install it locally with yum:
$ sudo yum --nogpgcheck localinstall cloudera-cdh-5-0.x86_64.rpm
Start the installation
1. (Optionally) add a repository key:
$ sudo rpm --import http://archive.cloudera.com/cdh5/redhat/5/x86_64/cdh/RPM-GPG-KEY-cloudera
2. Install Hadoop in pseudo-distributed mode:
$ sudo yum install hadoop-conf-pseudo
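If you are curious what the package actually installed, listing its files is a quick way to see the pseudo-distributed configuration it ships (on CDH this typically lands under /etc/hadoop/conf.pseudo); this check is purely optional:
$ rpm -ql hadoop-conf-pseudo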
Start Hadoop and verify the environment
At this point the pseudo-distributed installation of Hadoop is complete. Next we do a bit of configuration and start Hadoop.
- Format the NameNode. First run su to switch to the root user, then run:
$ sudo -u hdfs hdfs namenode -format
The NameNode must be formatted before it is used for the first time.
- Start HDFS:
$ for x in `cd /etc/init.d ; ls hadoop-hdfs-*` ; do sudo service $x start ; done
To verify that it started successfully, open http://localhost:50070 in a browser. There you can see the capacity of the distributed file system, the number of DataNodes, and the logs; under the pseudo-distributed configuration you will only see a single live node, localhost.
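If you would rather check from the command line than the browser, the standard HDFS admin report gives the same information (a stock Hadoop command, not a step from the manual):
$ sudo -u hdfs hdfs dfsadmin -report    # in pseudo-distributed mode this should list exactly one live DataNode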
- Create the /tmp, staging, and log directories:
$ sudo -u hdfs hadoop fs -mkdir -p /tmp/hadoop-yarn/staging/history/done_intermediate
$ sudo -u hdfs hadoop fs -chown -R mapred:mapred /tmp/hadoop-yarn/staging
$ sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
$ sudo -u hdfs hadoop fs -mkdir -p /var/log/hadoop-yarn
$ sudo -u hdfs hadoop fs -chown yarn:mapred /var/log/hadoop-yarn
- Run the following command to check that the directories were created:
$ sudo -u hdfs hadoop fs -ls -R /
You should see the directory structure we just created on HDFS:
drwxrwxrwt - hdfs supergroup 0 2012-05-31 15:31 /tmp
drwxr-xr-x - hdfs supergroup 0 2012-05-31 15:31 /tmp/hadoop-yarn
drwxrwxrwt - mapred mapred 0 2012-05-31 15:31 /tmp/hadoop-yarn/staging
drwxr-xr-x - mapred mapred 0 2012-05-31 15:31 /tmp/hadoop-yarn/staging/history
drwxrwxrwt - mapred mapred 0 2012-05-31 15:31 /tmp/hadoop-yarn/staging/history/done_intermediate
drwxr-xr-x - hdfs supergroup 0 2012-05-31 15:31 /var
drwxr-xr-x - hdfs supergroup 0 2012-05-31 15:31 /var/log
drwxr-xr-x - yarn mapred 0 2012-05-31 15:31 /var/log/hadoop-yarn
- Start YARN (YARN is the successor to the original MapReduce framework):
$ sudo service hadoop-yarn-resourcemanager start
$ sudo service hadoop-yarn-nodemanager start
$ sudo service hadoop-mapreduce-historyserver start
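To confirm the three daemons actually came up, you can ask each init script for its status. Assuming the stock Hadoop default ports (an assumption, not something from this guide), the ResourceManager web UI should also be reachable at http://localhost:8088 and the JobHistory Server at http://localhost:19888:
$ for x in hadoop-yarn-resourcemanager hadoop-yarn-nodemanager hadoop-mapreduce-historyserver ; do sudo service $x status ; done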
- Create user directories. Create a home directory for each MapReduce user, for example:
$ sudo -u hdfs hadoop fs -mkdir -p /user/<user>
$ sudo -u hdfs hadoop fs -chown <user> /user/<user>
Here our username is cdh5, so we simply substitute it for <user> (the substituted commands are spelled out below). With that, the environment configuration is complete; next let's run an example to check it.
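For completeness, with cdh5 substituted the two commands above read:
$ sudo -u hdfs hadoop fs -mkdir -p /user/cdh5
$ sudo -u hdfs hadoop fs -chown cdh5 /user/cdh5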
Run a YARN example
- First, as in the step above, create a home directory for a Hadoop user:
$ sudo -u hdfs hadoop fs -mkdir -p /user/joe
$ sudo -u hdfs hadoop fs -chown joe /user/joe
- Then switch to user joe with su joe, create an input directory, and copy a few XML files into it:
$ hadoop fs -mkdir input
$ hadoop fs -put /etc/hadoop/conf/*.xml input
$ hadoop fs -ls input
Found 3 items:
-rw-r--r-- 1 joe supergroup 1348 2012-02-13 12:21 input/core-site.xml
-rw-r--r-- 1 joe supergroup 1913 2012-02-13 12:21 input/hdfs-site.xml
-rw-r--r-- 1 joe supergroup 1001 2012-02-13 12:21 input/mapred-site.xml
- Set user joe's environment variable:
$ export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
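An export like this only lasts for the current shell session. If you want it to persist across logins for joe, one option (a common convenience, not a step from the manual) is to append it to ~/.bashrc:
$ echo 'export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce' >> ~/.bashrc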
- Run the MapReduce example. It searches the input directory for strings matching the regular expression dfs[a-z.]+; the command is:
$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar grep input output23 'dfs[a-z.]+'
- After the job finishes, inspect the output23 directory:
$ hadoop fs -ls output23
Found 2 items
drwxr-xr-x - joe supergroup 0 2009-02-25 10:33 /user/joe/output23/_SUCCESS
-rw-r--r-- 1 joe supergroup 1068 2009-02-25 10:33 /user/joe/output23/part-r-00000
- The results are in the part-r-00000 file, which we can view:
$ hadoop fs -cat output23/part-r-00000 | head
1 dfs.safemode.min.datanodes
1 dfs.safemode.extension
1 dfs.replication
1 dfs.permissions.enabled
1 dfs.namenode.name.dir
1 dfs.namenode.checkpoint.dir
1 dfs.datanode.data.dir
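If you want to re-run the example, note that Hadoop will not overwrite an existing output directory, so either pass a new output name (for instance output24, just an arbitrary example) or remove the old one first:
$ hadoop fs -rm -r output23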
At this point our CDH 5 environment is fully set up, and we can run Hadoop programs in it.